Revisiting Approximate Linear Programming Using a Saddle Point Based Reformulation and Root Finding Solution Approach
Abstract
Approximate linear programs (ALPs) are well-known models for computing value function approximations (VFAs) for high-dimensional Markov decision processes (MDPs) arising in business applications. VFAs from ALPs have desirable theoretical properties, define an operating policy, and provide a lower bound on the optimal policy cost, which can be used to assess the suboptimality of heuristic policies. However, solving ALPs near-optimally remains challenging, for instance, in applications where the MDP includes cost functions or transition dynamics that are nonlinear, or when rich basis functions are required to obtain a good VFA. We address this tension between ALP theory and solvability by (i) proposing a saddle point based reformulation of an ALP that endogenizes a state-action density function as a dual decision variable to avoid non-convexities, and (ii) developing a solution approach, ALP-Secant, that combines root finding and saddle point methods to solve this reformulation. We establish that, with high probability, ALP-Secant returns a near-optimal ALP solution and a lower bound on the optimal policy cost in a finite number of iterations. We numerically compare ALP-Secant with the commonly used constraint sampling approach for solving ALPs and with a look-ahead heuristic on inventory control and energy storage applications, where row generation is not a viable option. We find that ALP-Secant is more effective than constraint sampling for solving ALPs and delivers high-quality policies and lower bounds, with its policies outperforming those from the other two heuristics. Our ALP reformulation and solution approach broaden the applicability of approximate linear programming.
Similar resources
A revisit of a mathematical model for solving fully fuzzy linear programming problem with trapezoidal fuzzy numbers
In this paper, a fully fuzzy linear programming (FFLP) problem with both equality and inequality constraints is considered, where all the parameters and decision variables are represented by non-negative trapezoidal fuzzy numbers. According to the current approach, the FFLP problem with equality constraints is first converted into a multi-objective linear programming (MOLP) problem with crisp const...
Using finite difference method for solving linear two-point fuzzy boundary value problems based on extension principle
In this paper, an efficient algorithm based on Zadeh's extension principle is investigated to approximate the fuzzy solution of two-point fuzzy boundary value problems with fuzzy boundary values. We use the finite difference method in terms of the upper and lower bounds of the $r$-level of the fuzzy boundary values. The proposed approach gives a linear system with crisp tridiagonal coefficients matr...
Exact and approximate solutions of fuzzy LR linear systems: New algorithms using a least squares model and the ABS approach
We present a methodology for characterization and an approach for computing the solutions of fuzzy linear systems with LR fuzzy variables. As solutions, notions of exact and approximate solutions are considered. We transform the fuzzy linear system into a corresponding linear crisp system and a constrained least squares problem. If the corresponding crisp system is incompatible, then the fuzzy ...
Solving Fractional Programming Problems based on Swarm Intelligence
This paper presents a new approach to solve Fractional Programming Problems (FPPs) based on two different Swarm Intelligence (SI) algorithms. The two algorithms are: Particle Swarm Optimization, and Firefly Algorithm. The two algorithms are tested using several FPP benchmark examples and two selected industrial applications. The test aims to prove the capability of the SI algorithms to s...
A New Mathematical Approach based on Conic Quadratic Programming for the Stochastic Time-Cost Tradeoff Problem in Project Management
In this paper, we consider a stochastic Time-Cost Tradeoff Problem (TCTP) in PERT networks for project management, in which all activities are subjected to a linear cost function and assumed to be exponentially distributed. The aim of this problem is to maximize the project completion probability with a pre-known deadline to a predefined probability such that the required additional cost is min...
Publication date: 2017